New hard-thresholding rules based on data splitting in high-dimensional imbalanced classification
Abstract
In binary classification, imbalance refers to situations in which one class is heavily under-represented. This issue arises either from the data collection process or because one class is indeed rare in the population. Imbalanced classification frequently arises in applications such as biology, medicine, engineering, and the social sciences. In this paper, for the first time, we theoretically study the impact of imbalanced class sizes on linear discriminant analysis (LDA) in high dimensions. We show that, due to data scarcity in one class, referred to as the minority class, and the high dimensionality of the feature space, the LDA ignores the minority class, yielding a maximum misclassification rate. We then propose a new construction of hard-thresholding rules based on a data splitting technique that reduces the large difference between the misclassification rates. We show that the proposed method is asymptotically optimal. We further study two well-known sparse versions of the LDA in imbalanced cases. We evaluate the finite-sample performance of the different methods using simulations and by analyzing real data sets. The results show that our method either outperforms its competitors or has comparable performance based on a much smaller subset of selected features, while being computationally more efficient.
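The effect described in the abstract, where LDA assigns almost everything to the majority class when the dimension is large and the minority sample is tiny, can be illustrated with a small simulation. The sketch below is not the paper's method and does not implement its hard-thresholding rules. It assumes Gaussian features with identity covariance, a plug-in LDA rule that uses only the sample means with a zero threshold, and hypothetical settings (`n_major`, `n_minor`, `p`, `signal`) chosen purely for illustration.

```python
import numpy as np


def per_class_error(n_major=200, n_minor=5, p=2000, signal=4.0,
                    n_test=500, seed=0):
    """Return (majority error, minority error) of a plug-in LDA rule.

    Assumptions (illustrative only): identity covariance, known to the
    classifier; class means estimated from the training samples.
    """
    rng = np.random.default_rng(seed)

    # True means differ in the first 16 coordinates only, so the
    # Euclidean distance between them equals `signal`.
    mu_major = np.zeros(p)
    mu_minor = np.zeros(p)
    mu_minor[:16] = signal / 4.0

    # Imbalanced training samples.
    x_major = rng.standard_normal((n_major, p)) + mu_major
    x_minor = rng.standard_normal((n_minor, p)) + mu_minor

    # Plug-in estimates of the class means.
    m_major = x_major.mean(axis=0)
    m_minor = x_minor.mean(axis=0)
    direction = m_major - m_minor
    midpoint = (m_major + m_minor) / 2.0

    def predict_major(x):
        # LDA score with identity covariance; positive means "majority".
        return (x - midpoint) @ direction > 0.0

    # Balanced test samples, so each class error is estimated separately.
    test_major = rng.standard_normal((n_test, p)) + mu_major
    test_minor = rng.standard_normal((n_test, p)) + mu_minor

    err_major = float(np.mean(~predict_major(test_major)))
    err_minor = float(np.mean(predict_major(test_minor)))
    return err_major, err_minor


if __name__ == "__main__":
    err_major, err_minor = per_class_error()
    print(f"majority-class error: {err_major:.3f}")
    print(f"minority-class error: {err_minor:.3f}")
```

Under these assumptions, the noisier estimate of the minority mean shifts the decision boundary toward the minority class, so the minority error is expected to be far larger than the majority error. Reporting the two rates separately, rather than overall accuracy, is what makes the gap visible.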
Similar resources
On Mining Fuzzy Classification Rules for Imbalanced Data
Fuzzy rule-based classification system (FRBCS) is a popular machine learning technique for classification purposes. One of the major issues when applying it to imbalanced data sets is its bias toward the majority class, such that it performs poorly with respect to the minority class. However, in many cases the minority classes are more important than the majority ones. In this paper, we have extended ...
Data mining rules and classification methods in insurance: the case of collision insurance
Assigning premiums to insurance contracts in Iran is mostly based on old rules authorized by the government; in such a situation, predicting premiums by analyzing the database and its characteristics would be a serious mistake. Therefore, the most beneficial information one can gather from these data is the amount of loss that occurs during a contract, for predicting insurance ...
Classification of High Dimensional and Imbalanced Hyperspectral Imagery Data
The present paper addresses the problem of classifying hyperspectral images with multiple imbalanced classes and very high dimensionality. Class imbalance is handled by resampling the data set, whereas PCA is applied to reduce the number of spectral bands. This is a preliminary study that aims to investigate the benefits of using these two techniques together, and also to evaluate ...
Journal
Journal title: Electronic Journal of Statistics
Year: 2022
ISSN: 1935-7524
DOI: https://doi.org/10.1214/21-ejs1939